Berlin Voter’s Map

On 18 September 2016, Berlin held its state election. Using the library “leaflet”, the results of the election are visualised by voting area.

Click on a voting area to see more information on the Winner, Voting area, and Number of votes in that area.
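A minimal sketch of such a leaflet map. The two rectangular “voting areas” and the column names `Winner`, `Area`, and `Votes` are made up for illustration; the original Berlin shapefile and result data are not reproduced here.

```r
library(leaflet)
library(sf)

# Two toy rectangular polygons standing in for real voting areas near Berlin.
sq <- function(x) st_polygon(list(rbind(
  c(x, 52.5), c(x + 0.05, 52.5), c(x + 0.05, 52.55), c(x, 52.55), c(x, 52.5)
)))
berlin_sf <- st_sf(
  Winner   = c("SPD", "CDU"),
  Area     = c("Mitte-01", "Pankow-02"),
  Votes    = c(1200, 950),
  geometry = st_sfc(sq(13.30), sq(13.40), crs = 4326)
)

# Colour each polygon by the winning party; clicking opens a popup.
pal <- colorFactor(c("black", "green", "purple", "red"),
                   domain = c("CDU", "Gruene", "Linke", "SPD"))

m <- leaflet(berlin_sf) %>%
  addTiles() %>%
  addPolygons(
    fillColor = ~pal(Winner), fillOpacity = 0.7,
    weight = 1, color = "white",
    popup = ~paste0("Winner: ", Winner, "<br>",
                    "Voting area: ", Area, "<br>",
                    "Number of votes: ", Votes)
  ) %>%
  addLegend(pal = pal, values = ~Winner, title = "Winning party")
```

Printing `m` in an interactive session renders the map in the viewer.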

How did Berlin vote?

The classification tree is built from the characteristics of voters in Berlin. The decision tree is first created using rpart, and then converted to a conditional inference tree using the as.party function.
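A sketch of that rpart-then-partykit workflow, using the built-in iris data as a stand-in (the sociodemographic data frame `df2` used in the article is not reproduced here):

```r
library(rpart)
library(partykit)

# Fit a classification tree (method = "class" for a categorical response).
fit <- rpart(Species ~ ., data = iris, method = "class")

# Convert the rpart tree into a partykit object so it can be plotted
# in the conditional-inference-tree style.
party_fit <- as.party(fit)
plot(party_fit)
```

The converted object keeps the rpart splits; only the representation changes.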

This shows that if a voting area is in East Berlin, and has less than 9.071% of Retiree, the Winner of that area is Gruene, with a likelihood is more than 80%. However, the tree seems rather overfitted. Let’s prune the tree, by selecting a tree size that has the least cross-validated error (“xerror”).

To relate the R-square and relative error of the tree to the number of splits, find the cross-validated error and identify the corresponding complexity parameter.
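In rpart this table comes from printcp; a sketch with iris as a stand-in for the article's own data (cross-validation in rpart is randomised, so a seed is set for reproducibility):

```r
library(rpart)

set.seed(42)  # xerror comes from 10-fold cross-validation, which is random
fit <- rpart(Species ~ ., data = iris, method = "class")

printcp(fit)  # CP, nsplit, rel error, xerror, xstd for each subtree
plotcp(fit)   # visual aid for picking the pruning cp
```

Each row of the table corresponds to one candidate subtree; `xerror` is the cross-validated error used below to choose the pruning point.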

## 
## Classification tree:
## rpart(formula = var, data = df2, method = "class")
## 
## Variables actually used in tree construction:
## [1] Born_Berlin EastWest    Foreigner   Hartz4      Retired    
## 
## Root node error: 488/653 = 0.74732
## 
## n= 653 
## 
##          CP nsplit rel error  xerror     xstd
## 1  0.270492      0   1.00000 1.03074 0.022027
## 2  0.168033      1   0.72951 0.72951 0.026075
## 3  0.096311      2   0.56148 0.56148 0.025842
## 4  0.065574      3   0.46516 0.48361 0.025156
## 5  0.047131      4   0.39959 0.43238 0.024489
## 6  0.024590      5   0.35246 0.40779 0.024103
## 7  0.014344      6   0.32787 0.40984 0.024137
## 8  0.012295      8   0.29918 0.36066 0.023235
## 9  0.010246     10   0.27459 0.36270 0.023276
## 10 0.010000     11   0.26434 0.36270 0.023276

This shows that pruning the tree to 8 splits gives the highest R-square and the lowest cross-validated error. Therefore, prune the tree using the smallest xerror and its corresponding complexity parameter:
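A sketch of that pruning step: look up the cp value in the cptable with the smallest xerror and pass it to prune (again with iris standing in for the article's data):

```r
library(rpart)

set.seed(42)
fit <- rpart(Species ~ ., data = iris, method = "class")

# cp value of the subtree with the smallest cross-validated error.
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]

pruned <- prune(fit, cp = best_cp)
```

The pruned tree keeps only the splits up to the chosen complexity parameter, discarding the overfitted lower branches.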

The second conditional inference tree is created directly with the ctree function from partykit. The two decision trees differ in several ways; two differences are highlighted here:

  1. ctree only works when the categorical variables are converted to the factor class.

  2. With the default settings, the conditional inference tree created by ctree is much more detailed than the tree created by rpart.
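A sketch of the direct ctree route, including the factor conversion noted in point 1 (iris again stands in for the article's data; the `as.character` step only simulates class labels arriving as plain text):

```r
library(partykit)

df <- iris
df$Species <- as.character(df$Species)  # suppose the labels arrived as character

# ctree needs a factor response to fit a classification tree,
# so convert the character labels first.
df$Species <- factor(df$Species)

ct <- ctree(Species ~ ., data = df)
plot(ct)
```

Unlike the rpart route, ctree chooses splits via permutation tests, which with default settings tends to yield the more detailed tree described above.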

Credit: The voting classification tree and the voting map were inspired by the Berliner Morgenpost, as were the voting tree data and the sociodemographic data.

The data source for the Berlin voting results can be found here.